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Abstract 

A goal in the kinetic characterization of a macromolecular system is the description of its slow 
relaxation processes, involving (i) identification of the structural changes involved in these pro- 
cesses, and (ii) estimation of the rates or timescales at which these slow processes occur. Most 
of the approaches to this task, including Markov models. Master-equation models, and kinetic 
network models, start by discretizing the high-dimensional state space and then characterize re- 
laxation processes in terms of the eigenvectors and eigenvalues of a discrete transition matrix. The 
practical success of such an approach depends very much on the ability to finely discretize the 
slow order parameters. How can this task be achieved in a high-dimensional configuration space 
without relying on subjective guesses of the slow order parameters? In this paper, we use the 
variational principle of conformation dynamics to derive an optimal way of identifying the "slow 
subspace" of a large set of prior order parameters - either generic internal coordinates (distances 
and dihedral angles), or a user-defined set of parameters. It is shown that a method to identify 
this slow subspace exists in statistics: the time-lagged independent component analysis (TIC A). 
Furthermore, optimal indicators — order parameters indicating the progress of the slow transitions 
and thus may serve as reaction coordinates — are readily identified. We demonstrate that the slow 
subspace is well suited to construct accurate kinetic models of two sets of molecular dynamics sim- 
ulations, the 6-residue fiuorescent peptide MR121-GSGSW and the 30-residue natively disordered 
peptide KID. The identified optimal indicators reveal the structural changes associated with the 
slow processes of the molecular system under analysis. 
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1 Introduction 



Conformational transitions between long-lived, or "metastable" states are essential to the function of 
biomolecules [27} l58l \b2\ 124) \78\ [28l ES] • These rare transitions are ubiquitously found in biomolecu- 
lar processes, including folding [34, 44], complex conformational rearrangements between native pro- 
tein substates [26", "63], and ligand binding [68j. Rare conformational transitions can be explicitly 
traced by either single- molecule experiments [TOl [88l |17l [44] or by high-throughput molecular dynam- 
ics simulations, either realized with few long trajectories [SH [48] or with many shorter trajectories 
[5^ [TT| 1571 [131 [771 [II]- Molecular dynamics (MD) simulations are unique in their ability to resolve 
the dynamics and all structural features of a biomolecule simultaneously. When the sampling problem 
can be overcome and the appropriateness of the force field parameters used is confirmed by accom- 
panying experimental evidence, MD simulations are amongst the most powerful tools to investigate 
conformational transitions in biomolecules. 

A current challenge with high-throughput MD simulations is to extract meaningful information from 
vast trajectory data in an objective way. To achieve this goal, the last few years have seen vast activity 
in the development of computational methods that extract kinetic models from the MD data. Kinetic 
models usually first partition the conformation space into discrete states [93| [6TJ |40l |33l [94] El 
[561 [211 [HQI [69]. Subsequently, transition rates or probabilities can be estimated f45l [561 [SH HSl [90l [801 
[69] . The resulting models are often called transition networks f74l [62l 32j, Diffusion maps fi8\ [76] . 
Master equation models [571 [M], Markov models [64J or Markov state models [861 [16] (MSM), where 
"Markovianity" means that the kinetics are modeled by a memory less jump process between states. 

The recent integration of classical statistical mechanics with modern molecular kinetics highlights 
the crucial role of the eigenvectors and eigenvalues of the Markov model transition matrix or Master 
equation rate matrix. This is because they approximate the exact eigenfunctions and eigenvalues of 
the propagator of the continuous dynamics [79j. The following eigenvalue equation is fundamental to 
conformation dynamics 

V^i = Xi^i (1) 

Here, V is the transfer operator that propagates probability densities of molecular configurations 
[8T1[66], ({){ are its eigenfunctions, and are the associated eigenvalues. Equivalent expressions are 
obtained by expressing the eigenfunctions in different weighted spaced, leading to the transfer operator 
formulation [ST] , or the symmetrized propagator formulation [T4] . Eq. ([T]) is fundamental because when 
solving it for the largest eigenvalues and associated eigenfunctions, all stationary or kinetic quantities 
are defined by them. For example: 

• 7^ is guaranteed to have a unitary stationary eigenvalue and the associated stationary distribution 
/i(x) = (/)i(x); The ensemble average of an observable o can be calculated from o and /i. 

• Experimentally measurable relaxation rates of the system can be computed from the eigenvalues 
as Hii = —T~^ In Ai, or the corresponding timescales as = z^:"^. 

• The metastable states (often referred to as "free energy basins" — although we will avoid this term 
as it would imply the projection onto some pre-defined coordinate set), can be computed from 
the sign structure of the leading eigenfunctions [HI [22j . 

• The structural transition associated to each relaxation timescale is defined by the corresponding 
eigenfunction [72] and corresponds to a transition between metastable sets. This fact can be 
used to assign structural changes to experimentally measurable timescales [65j. 

• Experimentally measurable correlation functions (e.g. fluorescence correlation, intermediate scat- 
tering function in dynamic neutron or X-ray scattering) can be computed as a sum of single- 
exponential relaxations with timescales computed from A^ and amplitudes from the (pi and the 
experimental observable |65l [43} [T5]. 

• From the largest m Eigenvalues and their associated Eigenfunctions, a rank-m propagator can be 
assembled that can describe the dynamics slower than timescale tm [46]. From this propagator, 
many properties can be calculated, such as transition pathways between two sets of configurations 
[5ll[71[67]. 
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The approximation error of all of the above quantities can be cast in terms of the approximation error 
of the eigenvalues and eigenfunctions [79l |23l ITU |7l] . Vice versa, all of the above quantities are easily 
and precisely computable when the eigenvalues and eigenfunctions of V have been approximated with 
high precision. Consequently, any modeling method that attempts to compute the above quantities 
must aim at approximating the eigenvalues and eigenfunctions of 7^ - either explicitly or implicitly. 

Markov models and most of the other aforementioned kinetic models require a discretization of con- 
figuration space to be made. This is typically done by choosing representative configurations by some 
data clustering method, and then partitioning the configuration space by a Voronoi tesselation. In 
contrast to other fields of data analysis, the purpose of clusters is not a classification of configurations, 
but rather a sufficiently fine discretization of configuration space such that the eigenfunctions can be 
well approximated in terms of step functions on the Voronoi cells [72]. In order to achieve this, the 
metric must be chosen such that a fine partition of the relevant "slow" order parameters, i.e. those 
which are good indicators of the slow eigenfunctions (pi. 

How can the slow order parameters be identified without already having a high-precision Markov 
model? It was early noted that a priori order parameters, such as the root mean square distance 
(RMSD) to a single reference structure, the radius of gyration, or pre-selected distances or angles 
are often not good indicators of the slow eigenfunctions, and thus bear the danger of disguising the 
slow kinetics [45l [611 [56]. In order to avoid this, Markov model construction has focused in the last 
years focused on the other extreme - using general metrics that are capable of describing every sort 
of configurational change. Most notable is the minimal RMSD metric, which assigns to each pair of 
configurations their minimal Euclidean distance subject to rigid-body translation and rotation |39| . 
Minimal RMSD has been used successfully in many examples, especially protein folding (see [H l72] 
and references therein). Recent applications include folding of MR121-GSGS-W peptide[72], folding 
of FiP35 WW domain, GTT, NTL9, and protein G [6] and discovery of cryptic allosteric sites in 
/3-lactamase, interleukin-2, and RNase H [lOj. However, minimal RMSD tends to fail in situation 
where the largest-amplitude motions are not the slowest (an example of this is the natively disordered 
KID peptide analyzed below). Principal component analysis (PGA) is a frequently-used method to 
reduce the dimension of an order parameter space by projecting it on its linear subspace of the largest- 
amplitude motions [4J. PGA has also been used successfully in Markov model construction |I67| [65], 
however is suffers from the similar limitations like minimal RMSD, as there is no general guarantee 
that large-amplitude motions are associated with slow transitions. 

It is an important challenge to find a metric that provides a good indicator of the slow processes, such 
that a good approximation of the eigenfunctions (pi is feasible with a moderate number of clusters. 
The aim of this paper is to identify such a method. To be more precise, let ri, r^i G M be a possibly 
large set of d order parameters of a molecular system that are a priori specified by the user. Typical 
examples of order parameter include intramolecular distances and torsion angles. However, complex 
order parameters like the instantaneous dipole moment of a molecule, or an experimentally measurable 
quantity such as a FRET efficiency may also be included. Given this set of order parameters, we aim 
to 

1. Find the linear combination of order parameters that optimally approximates the dominant 
eigenvalues and eigenfunctions, such that a high-precision Markov model can be built in these 
order parameters with direct clustering. 

2. Identify the m order parameters that are best and least redundant indicators for the m dominant 
eigenfunctions, thus providing the user a direct physical interpretation which structural changes 
are associated with the slowest relaxation timescales (Feature selection). 

Here we use the variational principle of conformation dynamics [66j to derive an optimal solution for 
problem 1, and show that an existing extension to PGA solves this problem: time-lagged independent 
component analysis (TIGA) combines information from the covariance matrix and a time-lagged co- 
variance matrix of the data [55j. See [2j for a detailed description of the method. TIGA has recently 
been applied in the analysis of MD data. Naritomi and Fuchigami[57] used TIGA to investigate do- 
main motion of the LAO protein and compared it to PGA. Mitsutake et al [53] used relaxation mode 
analysis, a related technique, to analyze the dynamics of Met-enkephalin. Both studies showed that the 
slow modes were not necessarily associated with large amplitudes, and time-lagged mode analyses were 
thus better suited to detect them than PGA. Here, we demonstrate the usefulness of TIGA coordinates 
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for constructing Markov models for two rather different molecular processes: (i) the conformational 
dynamics of the small fluorescent peptide MR121-GSGSW, for which good Markov models can be 
built using a variety of methods, and of the natively unstructured 30-residue peptide KID, modeled 
through a large ensemble of explicit-solvent molecular dynamics (MD) simulations. 

We also propose a way to approach problem 2, identifying the optimal indicators of the slowest pro- 
cesses. These indicators inform the user of the structural process that is governing the slow relaxations 
of the macromolecule. Optimal indicators help in understanding what comprises the slow kinetics, and 
dramatically the user time to "search" for a structural character of the slow processes from a Markov 
model. 

2 Theory 

We summarize the variational principle of conformation dynamics, stating that the true eigenfunctions 
are best approximated by a Markov model, when the estimated timescales ti are maximized. We 
derive a way to optimally approximate the true eigenfunctions in terms of a linear combination of the 
original order parameters. We then show that this method is identical to the time-lagged independent 
component analysis (TICA) that is an established method in statistics. The TICA problem can be 
easily solved by subsequently solving two simple Eigenvalue problems. 

2.1 Exact dynamics in full configuration space 

We start by providing an expression for the propagator of exact continuous molecular dynamics, 
and show that in order to approximate its long-time behavior, its largest eigenvalues and associated 
eigenfunctions must be well approximated. 

We use Xt to denote the full molecular configuration at time t (if velocities are available, denotes 
a point in full phase space) in state or phase space Q. We assume that the molecular dynamics 
implementation is Markovian in Q (i.e. the time step to x^+r is computed based on the current value 
of Xt only), and gives rise to a unique stationary density /i(x), usually the Boltzmann density: 

/i(x) = Z-ie-^^("). 

where H is the Hamiltonian, Z is the partition function, and (3 = {kBT)~^ is the inverse temperature. 
We also assume that the dynamics are statistically reversible, i.e. that the molecular system is sim- 
ulated in thermal equilibrium. Let us denote a probability density of molecular configurations as pt, 
and let us subsume the action of the molecular dynamics implementation into the propagator V{t). 
The propagator describes the probability that a trajectory that is at configuration x^ at time t will be 
found at a configuration x^^^- a time r later. In an ensemble view, the propagator takes a probability 
density of configurations, pt and predicts the probability density of configurations at later time, pt+r'- 

Pt+T = V{r)pt 

We can write the propagator, by expanding it in terms of its eigenvalues: 

Xi{r) =e~^ 

and its eigenfunctions (/)^, as: 

oo 

Pt+r{y)=V{T)pt{x) = ;^e-^(Vi,pt)</'i. (2) 

where the eigenfunctions <pi (x) take the role of basis functions with which probability densities p can be 
constructed. The first eigenvalue is Ai = 1 and the remaining eigenvalues have a norm strictly smaller 
than 1. Thus, the first timescale is ti = oo and corresponds to the stationary distribution, while 
all other timescales ti are finite relaxation timescales. V^i(x) = /i~^(x)^^(x) are the eigenfunctions 
weighted by the stationary density. Eq. ([2| has a straightforward physical interpretation: the scalar 
product {ijji^ pt) measures the overlap of the starting density pt with the ith eigenfunction and thus 
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determines the amplitude by which this eigenfunction contributes to the dynamics. At any time r, 
the new probabihty density ptJ^r is composed of a set of basis functions With increasing time, the 
contributions of ah basis functions (pi with i > 1 vanish exponentiahy with a timescale given by ti. 
After infinite time r ^ oo, only the first term with = oo (and hence exp(— r/t^) = 1) is left, and 
the stationary density is reached: lim^-^oo = 0i = Stationarity implies that ji will not be 

changed under the action of the propagator: 

v{T)^i = Ai. 

Suppose we are interested in slow timescales r ^ tm+i- At such large times, the dynamics is governed 
by the m largest timescales ti and eigenfunctions of the propagator: 

m 

Pt+T = V{T)pt ^ '^e~~^{^lJi,pt)(})i. 

All kinetic properties at this timescale and all stationary properties can be accurately computed when 
the dominant m eigenvalues and eigenfunctions are approximated. This is our goal. 

2.2 Approximation of slowest timescales and the related eigenfunctions 

We can make a few general statements on how to approximate the true timescales ti and eigenfunctions. 
These general properties can be used to derive a general method that achieves the aim of this paper: 
the identification of the slowest order parameters in a molecule. Since (pi and ipi are interchangeable 
using the weights /i, the approximation problem can be described using either kind of eigenfunction. 
Subsequently we will always refer to the problem of approximating the weighted eigenfunctions ipi. 

Consider some function of the molecular configuration, /(x). From Eq. ([2| we can express the time- 
autocorrelation function of / as a function of r as: 

oo 

Suppose we would know the true eigenfunction V^i(x). It is now easy to show [66] that the time- 
autocorrelation function of V^i(x) yields the exact ith eigenvalue, and thus permits to recover the exact 
zth timescale: 

Ai(r) = (V^i(x^)V^i(x^+^)) = e"^ 
r 

" "ln|A,(r)| 

However, in reality we will not know the exact eigenfunction ipi. Suppose that we would guess a 
model function ^1)2 that is supposed to be similar to V^2- When we make sure that is appropriately 
normalized, the variational principle of conformation dynamics [66j shows that the time-autocorrelation 
function of ?/^2 approximates the true eigenvalue, and the true timescale from below: 

(Vi2(x^)Vi2(x^+^)) < e"^ 

h < t2 (4) 

where equality only holds for 1^2 = V^2- Thus, we have a recipe for finding an optimal approximation 
to the second timescale and its associated eigenfunction: We must seek a function 1^2 that has the 
maximum timescale ^2- 

Similar inequalities can be shown for the other eigenvalues and timescales ^3, t^. We can show that 
if one proposes a model function tpi that is orthogonal to the eigenfunctions 1 through i — 1, we also 
have: 

ii<ti. (5) 

This variational principle of conformation dynamics is analogous to the variational principle in quantum 
mechanics. 
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2.3 Best approximation of the eigenfunctions 



What is the relation of the variational principle above to Markov models? Since the eigenfunctions ifji 
are initially unknown and difficult to guess, it is reasonable to approximate them by functions ijji that 
are assembled from a linear combination of basis functions 

n 
/c=l 

which must be defined a priori, and the optimization problem then consists of finding the optimal 
parameters hik that we will denote by vectors G M"^, where we have chosen the dimension of the 
basis set, n, to be equal to the number of basis functions. The Ritz method [75j provides the optimal 
set of coefficients for an orthonormal basis set. Formally, if we define the covariance matrix between 
Ansatz functions as: 

4(T)=Et[xi(xt)xi(xt+0] 
And we require that the basis functions are orthogonal — which is equivalent to them being uncorrelated 
at lag time 0: 

(Xi, Xi)^ = E* [Xi(xt)Xi (x*)] = 4 (0) = 5ij (7) 

then the optimal set of coefficients is then given by the eigenvectors of the following Eigenvalue 
problem: 

&{T)hi = hX{T) (8) 

Let us now consider the more general case that the Ansatz functions are not orthonormal, i.e. (xi, Xj) 
Sij. In this situation we must first orthonormalized the basis coordinates before. This is done via a 
generalization to Eq. (|8|. For a non-orthonormal basis set, the optimal approximation to the true 
eigenvalues and eigenfunctions is obtained by solving the generalized eigenvalue problem: 

C^(r)b, = C^(0)b,A,(r) (9) 

One may formally rewrite Eq. ^ as D(r)b^ = A^b^ where D(r) = (C^(0))~-^C^(r) is an orthonormal 
basis set. Numerically, the matrix inversion of C^(0) is often poorly conditioned and should therefore 
be avoided. The results ([8| and |9| are well known from variational calculus. The appendix contains 
an illustrative derivation of Eq. ([91 relevant to the special choice of basis set used in this paper. 



2.4 Optimal linear combination of input order parameters 

Based on the above results we can now formulate a method to find a linear combination of molecular 
order parameters r = (ri(x), r(i(x)) that best resolves the slow relaxation processes. This is done 
by finding the optimal coefficients for Eq. ([6|. For this, we define the Basis function Xi to be identical 
to the mean- free coordinate r^(x) (if the original order parameters r^(x) are not mean- free, then we 
simply subtract the mean: r^(x) = r-(x) — (r^(x))): 

X,(x) = n(x) (10) 

Thus, our basis set has n = d dimensions. Now let us compute the correlation matrix of normalized 
order parameters as: 

</T)=Et[r,(xt)r,(x,+,)]=4(r) 

Then solving Eq. ([9| with the correlation matrix for lag times and r will provide us with the linear 
combination of input order parameters that optimally approximates the exact propagator eigenfunc- 
tions. See Appendix for a sketch of the usual derivation of Eq. ([9| for the case of TICA. It turns 
out that Eq. (|9| with the choice of coordinates ( 10 ) is known as the time-lagged independent compo- 



nent analysis (TICA) in statistics |55][57]. A robust algorithm to solve Eq. ([9| is known as AMUSE 
algorithm [47j and will be given below. 

The Eigenfunction approximations via Eq. (|6| using the coefficients b^ are the optimal approximation 
to the true eigenfunctions and will give an optimal approximation of the timescales. As a result of the 
variational principle, A2(r) < A2(r) and 

Ur) = --^<t2 
lnA2(r) 
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according to |4] Since the true eigenfunctions are generally nonlinear functions of the original order 
parameters, and the basis set used in Eq. (10) is linear in the original order parameters, it cannot 
be expected that ^ true, and therefore the variational principle can at this point not be 

extended to further timescales than ^2- In other words, the TIC A timescales ts,...,^^ may be both 
under- or overestimated. 



2.5 Markov models and implied timescales 

We do not intend to use the TICA timescales directly, but rather use the TICA subspace in order to 
construct a Markov model by finely discretizing this space. What can be said about the timescales 
of the resulting Markov model? We can use the variational principle summarized above to bound 
the timescales of a Markov model. Classical Markov models operate by assigning a configuration x 
uniquely to one of the geometric clusters used to construct them. It can be shown [79j that this 
operation is equivalent to use the basis functions 



l^(x) 



i.e. each basis function z is a step function with has a constant value on the configurations belong- 
ing to the ith cluster and is zero elsewhere. This basis is an orthonormal basis set: (Xi^Xj)fi = 
TT^^ /xg5 ~ ^'^j' "^^^^^ direct Ritz method applies and as shown in [66], Eq. (p| becomes: 

T(r)r, = r,A,(r) (11) 

where T(r) is the Markov model transition matrix and R = [ri, ...,r^] are its right eigenvectors. To 
relate Eq. (8) and (11) we have used the definition C^(r) = ./^Tij(r), i.e. the covariance matrix 



between Ansatz functions x is the symmetrized transition matrix as given in [14J. 

Thus, a Markov model is the Ritz method for the choice of a step- function basis on the clusters 
used to build it, and thus gives an optimal step- function approximation to the eigenfunctions and 
maximal eigenvalues amongst all choices of functions that can be supported by the clustering. It 
follows from Eq. Q that at least the second timescale will then be underestimated. When the Markov 
model is sufficiently good in approximating the slowest processes, all of the first m timescales will be 
underestimated as given by Eq. (d). It was shown [23] that this estimation error becomes smaller 
when r is increased. Prinz et al [71j showed that it decreases with r~^. As a result, when plotting the 
estimated timescales £i(r) as a function of r one obtains the well-known implied timescale plots shown 
in Figs. ^ and Q, where the estimated timescales {^(r) slowly converge to the true timescale when 
r is increased. 

We have now seen that both the TICA eigenvalue A2 and the corresponding timescale ^2 are underesti- 
mated, as well as the Markov model eigenvalue A2 and the corresponding timescale ^2. Unfortunately, 
we cannot make a rigorous statement of how ^2 and ^2 are related to each other. However, we can 
make the ad hoc statement that we intend to cluster the dominant TICA subspace "sufficiently fine". 
Thereby the Markov model step functions of the dominant TICA component allows the nonlinear 
eigenfunction V^2(x) to be approximated better than by the linear combination of order parameters 



(10) directly. For example, it is typical that the eigenfunction V^2(x) stays almost constant over a large 
part of configuration space and then changes abruptly to a different level in the transition state [81 ] [72 ] . 
Such a behavior can be much better described by a step function than by a linear fit. Therefore, we 
shall here assume that the estimates of the dominant timescale as t2 < ^2 < ^2- The dominant TICA 
timescale ^2 is a lower bound to the true timescale ^2, but typically a poor lower bound. The Markov 
model timescale ^2 is typically larger, and thus a better estimate of the true timescale ^2- This concept 
is illustrated in Fig. [T] 



3 Methods 

Having identified the "slow" linear combinations of input order parameters, the hope is that clustering 
in a low-dimensional linear subspace will provide a useful clustering metric for the accurate and efficient 
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Dominant TICA parameter ^2(x) = y^62fc?^fc(x) 



(b) 



^2 (upper bound) 



t2 



t2 



TICA 



^MSM 



Figure 1: Scheme illustrating different approximations to the dominant eigenfunction iIj2 of the Molec- 
ular dynamics propagator, and the associated approximations to the slowest relaxation timescale ^2- 
TICA (blue) approximates the eigenfunction ?/^2 (black) as a linear combination of molecular observ- 
ables and the TICA timescale t2 associated to the TICA eigenvalue A2 underestimates (usually strongly) 
the true timescale ^2- The estimate is then improved by building a Markov model in TICA space which 
approximates the eigenfunction ?/^2 by a step function (red) that is constant on the Markov model clus- 
ters. The corresponding Markov model estimate of the relaxation timescale, ^27 is thus typically larger 
than the TICA timescale t2 and a better estimate of the true timescale ^2- 



construction of Markov models with a moderate number of clusters. Here, we compare the performance 
of different cluster metrics which are briefly described in the present section. 

There are many software packages available for performing data clustering. For clustering and Markov 
model construction of molecular dynamics data, the packages EMMA [83], MSMbuilder [5], Wordom 
^82j and METAGUI [Sj are currently available. Here, we use the EMMA package. 

3.1 Clustering methods and partitioning of state space 

Clustering methods for discretizing MD trajectory data can be divided into two categories: 

1. explicit coordinate methods that treat molecular coordinates as elements of an explicit vector 
space. The MD data is projected into the chosen coordinate set and then clustered by some 
distance metric (e.g. Euclidean distance) in that space. 

2. pure metric methods that have no explicit vector space available, but rather a metric that 
measures the distance between pairs of molecular conformations. The clustering algorithm groups 
only existing configurations via this distance metric. 

Coordinates used in (1) include Cartesian coordinates (provided there is a meaningful coordinate origin, 
for example defined by the largest molecule or domain in the system). For example, in Refs. |3Ql [T3], 
the binding of a small ligand was analyzed by using the three-dimensional positions of the ligand with 
respect to the protein. Other frequently used coordinate sets are intramolecular coordinates such as 
dihedral angles [62j or inter- atomic distances. Alternatively to directly clustering the primary set of 
coordinates, coordinate transforms can be applied to preprocess the data, with the hope of identifying 
sub-spaces in which the clustering will be more informative. Below we will discuss the transformations 
PCA and TICA in detail. 

A metric often used in (2) is the normalized Euclidean metric after rigid-body translation and rotation 
has been removed, in short the minimal RMSD (or least RMSD) metric [39j. Minimal RMSD is 
an established metric for the analysis of MD trajectories and MSM construction and used here as a 
reference. 

Discretization of trajectory data is performed using clustering algorithms. In a first stage the trajectory 
is explored and n representative points of the coordinate space are selected as cluster centers with a 
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clustering method. Various algorithms have been proposed in the literature, including /c- means [50], 
/c-centers [20j, /c-medoids [42j, regular spatial clustering[83j, regular temporal clustering[83j and Ward 
clustering[6j. The /c- means algorithm requires the coordinate space to be a vector space (in order to 
compute the mean) whereas the other aforementioned algorithms only require a metric. 

Subsequent to the identification of cluster centers, the state space is partitioned by assigning each 
trajectory frame to its closest cluster center according to the same metric used for clustering. The 
discretization obtained this way is a Voronoi tessellation of the observed coordinate space. Voronoi 
cells form a complete partition of the conformation space. 



3.2 Principal component analysis (PCA) 

PC A is a linear transform that transforms coordinates in such a way that their instantaneous corre- 
lations vanishes. It is frequently used in the MD community in order to identify the linear subspace 
in which the largest-amplitude motions occur, with the hope that these large- amplitude motions are 
most informative of functionally relevant transitions [4]. 

Let r G be a vector of order parameters used, for example distances or Cartesian positions. Without 
restriction of generality we assume that r is mean-free, i.e. the mean of the data has already been 
subtracted. Note that r is generally only a subset of the full phase space coordinates, thus is a 
subset of Q. For example, in protein simulation r usually only contains the protein coordinates, but not 
those of the solvent. The covariance matrix of the order parameters r is defined by the elements: 

while the estimator for trajectory data containing N discrete time steps is: 



1 ^ 



The elements c[^- are covariances between different order parameters if i ^ j and autocovariances if 
i =j. 

Principal components (PCs) are uncorrelated variables y that are obtained via an orthonormal trans- 
form of the original order parameters r. For this, the eigenvectors of the correlation matrix are 
obtained: 

in matrix form, with the eigenvector matrix W = [wi,...,Wf^] and the matrix of variances 5]^ = 
diag(cr?,..,cr^). 

In order to transform an original coordinate vector r into principal components, we perform: 

yT = (12) 

Usually, principal components are sorted according to their autocovariance af. If af decays rapidly 
with z, one often selects a threshold and ignores all PCs with smaller erf. This is done by using some 
G ]R^><^ matrix, which only contains the dominant m < d column vectors of W. Used in this 
way, PCA is a tool for dimension reduction. In the present paper we use PCA in two ways: (1) as 
a direct dimension reduction tool to yield a subspace for clustering and subsequent Markov model 
construction, and (2) to transform the original data into the full set of principal components via Eq. 



(12), thus arriving at a decorrelated coordinate set as an input for the subsequent transform (see 
subsequent section). This way of using PCA is called whitening the data [Ij. 

PCA is often used to analyze MD data, and has also been employed in the construction of Markov 
models. We used PCA to reduce the dimension of Pin WW protein simulations in order to build 
a protein folding Markov model with a lagtime of only r = 2 ns [67J. Stock and co-workers have 
explored the possibility of using dihedral angles as an input to PCA (dPCA). As angular coordinates, 
they cannot be averaged in the same way like nonperiodic coordinates. Ref. [3j thus suggests to use 
the sine and cosine of backbone dihedral angles as input coordinates. It is found for hepta-alanin 
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that small-amplitude PCs are unimodal while large-amplitude PCs have multimodal distributions and 
thus contain the interesting conformation dynamics. Performing /c-means clustering on the PCs did 
produce states with high metastability. However, it has proven difficult to analyze proteins with several 
secondary structure elements using dPCA. In subsequent work [35j, the approach was extended to the 
"dihedral PCA by parts" method. 



3.3 Time-lagged independent component analysis (TICA) 

Like PCA, TICA [55j uses a linear transform to map the original order parameters r{t) to a new set 
of order parameters z(t) — the independent components (ICs). Unlike PCs, ICs have to fulfill two 
properties: 

1. they are uncorrelated and 

2. their autocovariances at a fixed lag time r are maximal. 
The time-lagged covariance matrix C^(r) is defined by: 

clj{T) = {n{t)r,{t + r)) 
and the estimator for trajectory data containing N time steps is given by: 

N-T 



The elements of C^(r) are time-lagged autocovariances if i = j and time-lagged cross covariances 
if i ^ j. As shown in the Appendix, this matrix is symmetric under the assumption of reversible 
dynamics and in the limit of good statistics. For a finite dataset, symmetricity must be enforced. 

We seek a transformation matrix U = [ui, ...,U(^] that diagonalizes C^(0) (to fulfill property 1), and 
maximizes the autocorrelations cf^(r) = uf {r)ui for every column of U (to fulfill property 2). 
As described in Sec. |2.4[ this is accomplished by solving: 



C'^(t)u, = C'-(0)uiAi(T), (13) 
Eq. (13) is equivalent to Eq. (|9|. See appendix for an illustrative derivation of (13). As described in 



Sec. |2.4[ the second-largest estimated eigenvalue is a lower bound for the real second-largest propagator 
eigenvalue: A2(r) < A2(r). 

ICs are now ordered according to the magnitude of the auto covariance A^(r), and the ICs with the 
largest autocovariances Xi{r) will be called dominant. Since the dominant m ICs yield the linear sub- 
space in which most of the slow processes are contained, it is reasonable to now perform a direct cluster- 
ing in this subspace, thus aiming at approximating the nonlinear behavior of the slowest m eigenfunc- 
tions with step functions. This will yield a better approximation to the m slowest timescales. Rewriting 



Eq. (13) in matrix form, with and the matrix of autocorrelations A(r) = diag(Ai(r), .., A(^(r)) yields: 

C^(r)U = C^(0)UA(r). (14) 
In order to transform an original coordinate vector r into independent components, we perform: 

= r^U (15) 



How can (14) be solved? If C^(0) or C^(r) were invertible the generalized eigenvalue problem could be 
transformed into a normal eigenvalue problem. But as we expect some of our original order parameters 
to be highly correlated, the determinants of and C^(r) will be nearly zero, prohibiting this option. 



Alternatively, one can seek the solution of (13) via generalized eigensolvers. 



However, there is a simple and efficient alternative to this: Problem (14) can also be solved by solving 



two simple eigenvalue problems using the AMUSE algorithm [47J. It consists of the following steps: 
1. Use PCA to transform mean- free data r{t) into principal components y{t). 
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2. Normalize principal components: y' (t) = S) ^y(t). 

3. Compute the symmetrized time-lagged covariance matrix C'^ = ^[C^ + (C^ j"*"] of the normal- 
ized PCs. 

4. Compute an eigenvalue decomposition of C'^ , obtaining eigenvector matrix V and project the 
trajectory y' (t) onto the dominant eigenvectors to obtain z(t). 

This only works when the eigenvectors of C'^ are uniquely defined, i.e. if the eigenvalues are not 
degenerated [Ij. The main idea of this algorithm is, that properties (1) and (2) can be fulfilled one 
after the other. First, steps 1 and 2 use PC A to produce decorrelated and normalized trajectories y'(t)^ 
also known as whitening the data. Then steps 3 and 4 maximize the time lagged autocovariances. 
Because the matrix V which is used in step 4 is unitary, it preserves scalar products between the 
vectors y'{t). Now if y'{t) are chosen to be uncorrelated (and properly normalized) then also z(t) will 
be uncorrelated. 

In summary, the transformation Eq. ( p!5| can be written as a concatenation of three linear transforms: 

z^(t) = r^(t)U = r^(t)W5]-^V. (16) 

TICA will be used as a dimension reduction technique. Only the dominant TICA components will be 
used to construct a Markov model. 

3.4 Markov model construction 

Markov models are constructed by first performing a data clustering using an appropriate metric 
described in the results section (using the EMMA command mm_ cluster) and subsequently converting 
the molecular dynamics trajectory files into discrete trajectory files containing the sequence of cluster 
indexes visited (using the EMMA command mm_assign). For the sake of the current paper, the main 
analysis is the behavior of the relaxation timescales that are implied by the estimated Markov model 
(EMMA command mm_timescales). 

All Markov model estimation is done as proposed in [75], using the maximum probability estimator 
of reversible transition matrices with a weak neighbor prior count matrix (EMMA default). Let us 
call the transition matrix T(r), then it has the right eigenvectors -0^, the left eigenvectors (f)^ and the 
eigenvalues according to the following Eigenvalue equations: 

T(t)V;, = \i{T)'^i 
4>fT{T) = \{T)<i>^ 

We order eigenvalues by descending norm. When T(r) is connected (irreducible), it will have a 
unique eigenvalue of norm 1. The corresponding eigenvector can be normalized to yield the stationary 
distribution tt: 

TT^ = 7r^T(r). 

Since T(r) fulfills detailed balance, the left and right eigenvectors are related by: 

0i = diag(7r)'0^ 

The estimated (implied) relaxation timescales of the Markov model are given by 

r 

u — - — . 

In A(r) 

which are - ignoring statistical errors - related to the true relaxation timescales by U < ti (see theory 
section), and are typically larger than the timescales implied by the TICA eigenvalues (see theory 
section and Fig. [T]). 
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3.5 Optimal indicators 



Given the final Markov model transition matrix T(r), we can now establish a simple way to quantify 
how well each of the order parameters Vk serving as an input serves as an indicator of the slow 
process described by the eigenvector We simply compute the correlation between all pairs of order 
parameters and eigenvectors, and then, for each eigenvector, choose those order parameters that have 
a maximum correlation: 

r.Mi) = .r,m.J^^:^^zMM. (17) 



The averages in Eq. (17) can be computed either via Markov model states (in this case the average 
value of r/c, is computed for every microstate j, obtaining fkj, and the correlation is given by {rk^l^^) = 
X]j=i ^j^kji^ij)' Here we instead choose to evaluate the averages as a time average over all trajectory 
data. In this case, the eigenvector coordinate is given by the microstate each trajectory frame is 
associated to. 



4 Results 

The proposed methodology is demonstrated on two different peptide systems: the fluorescent peptide 
MR121-GSGSW and the 30-residue natively unstructured peptide KID. 

MR121-GSGSW is a well-studied fluorescent peptide that has been extensively characterized by ex- 
periments [59j, simulations [19j and also Markov models [60^, "65]. Here, a data set of two explicit 
solvent simulations of 3 /j.s each is used that is publicly accessible as a benchmark dataset for the 



EMMA software package (see http://simtk.org/home/emma). The details of the simulation setup are 
described in [72] . 

The slowest relaxation timescale of the MR121-GSGSW data set has been estimated to be between 20 
and 30 ns, and it has been found that the slowest processes are dominated by the interaction between 
MR121 and the tryptophan residue ^j. The data set is used as a benchmark system to test whether 
Markov model construction in PCA or TICA coordinates manage to identify the slow parameters, 
approximate the slow processes, and assign the correct timescales. 

Fig. [2]A.l shows a sample structure of MR121-GSGSW. Fig [2^ shows a benchmark for the relaxation 
timescales computed by a regular-space clustering in pairwise minimal RMSD metric using 1000 cluster 
centers. The slowest processes are found at about 25 ns, 12 ns and 8 ns, slightly larger — and thus more 
accurate according to the variational principle in Eq. Q — than by the coarser Markov model in [65]. 
To set up the direct clustering, two internal coordinate sets are considered: (i) the set of 66 distances 
between 12 coordinates defined by the 5 C^s and the 7 ring centers involved, and (ii) the center 
position and the orientation vector coordinates of the tryptophan sidechain in a coordinate set defined 
by the MR121 principal axes (see Fig. [2]A.2 for an illustration). Fig. |2]:l-3 show the results of direct 
k-means clustering with 1000 cluster centers in the space of 66 intramolecular distances (CI), only the 
9 tryptophan coordinates (C2), and the combined set (C3). It is clearly seen that the intramolecular 
distances are not suited to resolve the slowest processes, while the tryptophan coordinates resolve 
them very well. This can be understood from the structural arrangements shown in Fig. [3] that are 
dominated by the relative orientation of the tryptophan sidechain with respect to the MR121 ring 
system. Especially the slowest process, the stacking-order exchange of the two ring systems, cannot 
be well described by the intramolecular distances that are similar when the tryptophan is "above" or 
"below" the MR121. Fig. |2p3 shows that discretizing the combined coordinate set resolves the slowest 
processes with similar timescales as in the 9 Trp-coordinate set alone. This is not always expected, as 
increasing the dimensionality of the space to be clustered while keeping the number of clusters constant 
will often reduce the resolution. 

In the subsequent PCA and TICA analysis different linear subspaces of the combined coordinate 
set were considered. Interestingly, clustering the principal components reduces the quality of the 
Markov model significantly. This is explained by the fact that the largest-amplitude motions in the 
present system is the transition between structures in which Trp and MR121 are in contact, and 
open structures. However, open structures have a very low population, giving rise to a rather fast 
timescale of the opening/closing process. The slowest processes, involving different arrangements and 
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orientations of the Trp and MR121 while being in contact, give rise to comparatively small amplitude 
motions. Using one and four PC A components (Dl and D2), the three slowest processes are not found. 
Using ten PCA components, the two slowest processes are found, although slightly underestimated, 
while the third-slowest process is not found. 

Fig. [2^1-3 shows that the TICA coordinates perform indeed very well. Using only the single slowest 
TICA coordinate does resolve the slowest process well and gives rise to a timescale of 20-25 ns, close to 
the expected value. Using the four slowest TICA coordinates resolves the two slowest processes well, 
while somewhat underestimating the third process. With ten TICA coordinates all slow processes 
are well resolved, and the timescales are found to be 27 ns, 13 ns, and 10 ns at a lagtime of r =10 
ns — slightly larger than in any of the other choices of metrics. 

Fig. [3]A. illustrates the structural transition involved in the two slowest processes occurring at around 
27 and 13 ns computed from the ten-dimensional TICA Markov model. We display the 1000 mi- 
crostates in a visualization that we shall call kinetic map, where the coordinates are given by the two 
slowest left eigenvectors 02, 4>^. For example, a cluster i is drawn at a position (02,2, ^3, i) with a size 
proportional to its stationary probability tt^. The map is termed "kinetic", because similar positions 
in eigenvector spaces mean that the states can relatively quickly reach one another, while distant 
positions only exchange on timescales t2 on the horizontal, and on timescale ts on the vertical axis. 
The left eigenvectors are chosen instead of the right eigenvectors, because the left eigenvectors are 
weighted by the stationary distribution: = TViipk^i- Thus, points on the border of the map tend to 
have larger stationary probability. Therefore, the extremal points are at the same time populous and 
kinetically distant, and can roughly be associated with the most stable "free energy minima", while the 
smaller clusters connecting them correspond to transition states. The structures, shown for the most 
populous and kinetically distinct clusters, indicate that the slowest relaxations are associated with a 
stacking-order exchange of the MR121 and Trp groups, and a rotation of the Trp group with respect 
to the MR121 group (see "marker" atom shown as a blue sphere). 

Fig. [3p illustrates the optimal indicators of the slowest processes, i.e. the input order parameters 
that have the largest correlation with the individual right Markov model eigenvectors ^ind 
The correlation plots show that the respective order parameters attain clearly different values at the 
end-states of the transition, i.e. for the minimal and maximal values of the respective eigenvector. At 
intermediate values of the eigenvectors, i.e. transition states, the order parameter can access many 
different values. This is easily seen in the slowest process (Fig. [3]3l), where the best indicator is the 
Trp z-position that mediates the stacking order exchange (correlation coefficient 0.84 with the second 
eigenvector 02)- While the value of the Trp z-position is clearly defined in the transition end-states, 
where the Trp is located "above" and "below" the MR121 moiety, the transition states include open 
configurations where the Trp and the MR121 are not in contact at all, and therefore all values of the 
z-position are accessible in these states. A similar behavior is seen for the second-slowest process (Trp 
sidechain rotation). 

We now turn to another molecular system. Here, an extensive set of simulations of the kinase inducible 
domain (KID) in explicit solvent were investigated. KID is part of the cAMP response element-binding 
protein (CREB). CREB is a transcription factor involved in processes as important as glucose reg- 
ulation and memory, and it binds the CREB-binding protein (CBP), a well-known cancer-related 
molecular hub with around 300 interacting protein partners [41]. KID belongs to a large and impor- 
tant class of intrinsically unstructured peptides (lUP), encompassing many hormones, domains and 
even whole proteins [91]. Unstructured regions perform their function even though they lack a well- 
defined secondary or tertiary structure in solution. Although standardized algorithms exist to detect 
unstructured regions on the basis of the primary amino acid sequence, the structural details of how dis- 
ordered regions exert their function is still elusive. For example, some unstructured domains, including 
KID, become folded upon binding [89]; it is therefore of much interest (e.g. for the druggability of 
protein-protein interactions) to investigate whether the presence of pre-formed elements causes folded 
conformations to be selected from the ensemble (conformational selection) [54J , or whether the binding 
rather occurs through induced- fit mechanics [85j. 

To shed light on this problem, we set-up an ensemble of all- atom simulations of the phosphorylated 
KID (pKID) domain. We have performed 7706 all-atom explicit-solvent simulations of 24 ns each 
using the ACEMD software [29] on the GPUGRID distributed computing platform [12j, yielding a 
total 168 jiis simulation data. However, due to the short simulations, only short lagtimes could be 
used, presenting a challenge to the Markov model construction. The detailed simulation setup is 
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described in the Appendix. 

Fig. |4] shows the performance of different metrics in their abiUty to resolve the slowest processes of 
KID. KID is a more difficult case than the MR121-GSGSW peptide because its natively unstructured 
nature gives rise to many fast large-amplitude motions which will conceal the slow processes in most 
ad hoc metrics. Fig. [4^ and C show that neither regular-space clustering in minimal pairwise RMSD 
metric, nor direct clustering in all distances yield a converged estimated of the timescales up to lagtimes 
of 10 ns. Between these two, regular-space RMSD is better, reaching a timescale of about 170 ns at 
r = 10 ns, while the direct clustering produces a timescale below 100 ns at r = 10 ns. Higher choices 
of lagtimes were avoided as they lead to a severe reduction of the usable data because the connected 
set of clusters drops significantly below 100% after that point. Fig. |4)l)l-3 show that the performance 
of principal components is even worse than direct clustering, giving rise to timescale estimates below 
20 ns for one principal component and below 50 ns for ten principal components. This confirms that 
the largest-amplitude motions are not the slowest in KID. 

Fig. |4pl-3 show the performance of the TICA coordinate using one, four, and ten dimensions. Using 
only the slowest TICA coordinate, a slow process of >200 ns is found, that has not been resolved by 
the clustering in any of the other metrics, however this timescale does not converge for lagtimes up to 
10 ns. Using only the four slowest TICA coordinates, there are already three processes resolved that 
are above 100 ns, and the convergence behavior improves. Using the ten slowest TICA coordinates, 
five processes slower than 100 ns are resolved. The slowest process converges to a timescale around 
220 ns and does so already at a lagtime r of 2-5 ns. Thus, the lagtime needed is a factor of 50-100 
smaller than the timescale of the process, indicating a very good discretization of the corresponding 
process. 

Fig. [5]A. illustrates the structural transitions associated with the two slowest relaxation processes of 
KID as identified by the Markov model using ten-dimensional TICA model. We have decided to 
focus on the two slowest processes around 200 and 220 ns relaxation time, because they are somewhat 
separated from the next-slowest processes occurring at around 100 ns. As the peptide has great 
structural variability it is of little value to plot all relevant structures. Therefore, we have plotted 
the positions of the microstates again in a kinetic map, using the coordinates of the two dominant 
left eigenvectors 02, 03. It is seen that at the slowest timescales, the system rearranges between 
mostly open and disordered structures (left), structures with one helix folded or partially folded (top 
right), and hairpin-like structures (bottom right). Thus the system has some residual helical structure, 
although it is not very stable in absence of a stabilizing binding partner. 

Fig. [5^ illustrates the optimal indicators of the slowest processes, i.e. the input order parameters that 
have the largest correlation with the resulting right Markov model eigenvectors ^ind Like for 
MR121-GSGSW, the correlation plots show that the respective order parameters are mainly able to 
distinguish the end-states of the transition, but unlike for MR121-GSGSW, multiple Ca — Ca distances 
are almost equally good indicators for the same process. Fig. [5^ shows correlation plots of the best 
indicators of 02 '03 5 t)ut indicates the five best correlations (all correlation coefficients above 0.7) 
in the structure. It is seen that the slowest process (timescale 220 ns) is best described by a hinge 
opening and closing, where the closed hinge appears to induce at least partial formation of the N- 
terminal helix (red, see Fig. |5^2). This is consistent with NMR experiments that have shown the 
N-terminal region to be approximately 50% helical in the apo-form [73j. The second-slowest process 
(timescale 200 ns) is best described by partial helix formation of the C-terminal part (blue) of the 
protein. 

5 Discussion 

In the present manuscript we have derived a method to find the optimal linear combination of input 
coordinates for approximating the slowest relaxation processes in complex conformational rearrange- 
ments of molecules. It is shown that an implementation for this method is already known in statistics 
as the TICA method, which is combined here with Markov modeling in order to construct models of 
the slow relaxation processes and precise estimates of the related relaxation timescales. It is shown 
that this approach of constructing Markov models yields slower timescales, and thus a more precise 
approximation to the true relaxation processes, than previous approaches. This is also achieved for the 
natively unstructured peptide KID where established approaches such as direct clustering in distance 
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space, minimal-RMSD-based clustering, or clustering in PCA space did not perform well because the 
largest-amplitude motions were not good indicators of the slowest relaxation processes. 

Beyond having an approach to construct quantitatively accurate Markov models in a way that is more 
robust than most previous approaches, we readily obtain a way to find best indicators of the slowest 
transitions. Best indicators are those molecular order parameters that are best correlated with the 
Markov model Eigenvectors describing the slowest processes, and thus serve as candidates for good 
reaction coordinates. Being able to point out such indicators provides a way to make the sometimes 
complex structural rearrangements readily understandable. 
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Figure 2: MR121-GSGSW peptide and its dominant relaxation timescales calculated via different 
Markov model construction methods. (Al) Sample structure of the peptide. (A2) Illustration of 
the Trp coordinates used. The center position of the Trp and the orientation vectors are given in 
a coordinate system defined by the MR121 principal axes. (B) Relaxation timescales using regular 
space RMSD clustering with approximately 1000 clusters. (C-E) Relaxation timescales using /c-means 
with 1000 clusters and Euclidean metric but operating on different subspaces: (CI) Intramolecular 
distances between all C^s and ring centers. (C2) Center position and orientation coordinates of the 
Trp moiety in the MR121 coordinate system. (C3) Combined coordinate set including intramolecular 
distances and Trp coordinates. (Dl-3) Dominant PC A subspace of the combined coordinate set using 
1, 4, and 10 dimensions. (El-3) Dominant TIC A subspace of the combined coordinate set using 1, 4, 
and 10 dimensions. 
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Figure 3: (A) Kinetic map of the two slowest relaxation processes of MR121-GSGSW (around 27 ns 
and 13 ns) calculated from the Markov model shown in Fig [2^3. The grey discs mark the coordinates of 
the 1000 microstates in the space of the left eigenvectors 02, (p^. The slowest relaxation of the system 
thus takes place on the horizontal axis, the second-slowest one on the vertical axis, and distances are 
associated with kinetic separation. The area of a disc is proportional to the stationary probability of 
the corresponding microstate. Some representative (kinetically distant and populous) microstates are 
shown as molecular structures. (B) Optimal indicators of the slow processes. The scatter plots show 
the correlation between the second and third right Markov model eigenvectors ? 03 ^ind the order 
parameters most correlated with them. The arrows in the structures show the optimal indicators. 
(Bl) The Trp z-position mediates the stacking order exchange and has a correlation coefficient of 0.84 
with the second eigenvector 02 (timescale 27 ns). (B2) The smallest Trp axis of inertia mediates 
the rotation of the side-chain and has a correlation coefficient of 0.59 with the third eigenvector 03 
(timescale 13 ns). 
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Figure 4: KID peptide and its estimated dominant relaxation timescales using different Markov model 
construction methods. (A) Sample structure of KID. (B) Relaxation timescales using regular space 
RMSD clustering with 1000 clusters. (C-E) Relaxation timescales using /c-means with 1000 clusters and 
Euclidean metric but operating on different subspaces. (C) All Ca — Ca distances. (Dl-3) Dominant 
PC A subspace of Ca — Ca distances using 1, 4, and 10 dimensions. (El-3) Dominant TIC A subspace 
of Ca — Ca distances using 1, 4, and 10 dimensions. 
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Figure 5: (A) Kinetic map of the two slowest relaxation processes of the KID peptide (around 200 ns 
and 220 ns) calculated from the Markov model shown in Fig|4^3. The grey discs mark the coordinates 
of the 1000 microstates in the space of the left eigenvectors 02, 4>^. The slowest relaxation of the system 
thus takes place on the horizontal axis, the second-slowest one on the vertical axis, and distances are 
associated with kinetic separation. The area of a disc is proportional to the stationary probability 
of the corresponding microstate. Some representative (kinetically distant and populous) microstates 
are shown as molecular structures. (B) Optimal indicators of the slow processes. The scatter plots 
show the correlation between the second and third right Markov model eigenvectors '02 7 03 ^ind the 
order parameters most correlated with them. The colored lines show all five best indicators, all having 
correlation coefficients with the respective eigenvectors of 0.7 or greater. The slowest process may 
thus be described as opening / closing of the hinge between the two helical domains of KID (timescale 
220 ns), while hinge-closing is associated with at least partial N-terminal helix formation (red). The 
second-slowest process may be described as partial helix formation in the "blue" region (timescale 200 
ns). 
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Appendix 



Derivation of TICA 

The generahzed Eigenvalue problem of Eq. (|9| , and more specifically the TICA problem can be derived 
in different ways. It goes back to the classical Ritz method [75j and can be found in many mathematical 
texts. In the following we sketch a standard derivation using variational calculus, see also |36[ |2] for a 
thorough discussion of this approach. 

Let r G be a vector of coordinates used, for example distances or Cartesian positions. Without 
restriction of generality we assume that r is mean-free, i.e. the mean of the data has already been 
subtracted. Note that r is contains generally only a subset of the full phase space coordinates, thus 
is a subset of ft. 

We now seek new coordinates z G as a linear transformation of r such that 

1. z are uncorrelated 

2. the autocovariances of z at a fixed lag time r are maximal. 
We will show that if coordinates z are given by a weighted sum of r 

d 

Zi{r) = ^Uikrk (18) 
the weight coefficients have to fulfill the generalized eigenvalue problem (see theory section) 

C^(r)u, = A,C^(0)u, (19) 
where C^(r) is the time-lagged covariance matrix that is defined by: 

4(r) = (r,(%(t + r)) (20) 

To prove this we rewrite the covariance matrix of z and the time-lagged covariance matrix of z using 



the defining equations (18), and (20). 



= {zi{t)zj{t)) = ^UikUji{rk(t)ri(t)) = ^ 2x^/^2x^7 4^(0) 

k,l k,l 

Cij(^) = {zi{t)zj{t^T)) = ^UikUji{rk{t)ri{t ^ t)) = ^UikUjicliir) 

k,l k,l 

We wish to maximize c^jir) (property 3) under the constraint, that cf^(0) = Sij (property 2). We 
start by computing one coordinate zi with maximal auto covariance. It is given by the weighted sum 
Zi = J UiUjClj{0), where we used the shorthand notation Ui = Uu. The constraint (2) for Zi is now 

cfi(o) = E,,i«i«icr,(o) = i. 

Since the matrix-elements c^j (0) are fixed, the autocovariance cfi (r) can be treated as a differentiable 
function of the coefficients Ui. Therefore, we need to maximize the function 

\k,l J \ k,l 

where A is the Lagrange multiplier. We perform the maximization by setting the partial derivatives of 
F with respect to the weight coefficients to zero. 



: 



Rearranging and rewriting this equation in matrix- vector form leads to (19) for i = 1. The same 
argument is used for the subsequent eigenvalues. We now prove that the solutions of (19) fulfill the 
properties requested above: 
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1. The IC's obtained by solving (19) are uncorrelated: Let be a generalized eigenvector 
with eigenvalue and let uj be a generalized eigenvector with eigenvalue Xj ^ A^. Then the 
orthogonality condition 

uf C'-(0)u,- = % (21) 

will hold if C^(0) and C^(r) are symmetric matrices. If and Uj are used as the weights in 

(18) this is equivalent to cf^ (0) = dij. 

Proof: 

AiC^(0)ui • u^- = C^(r)ui • u^- = • C^(r)u^- = • A^C^(0)u^- = A^C^(0)ui • u^- 

Therefore = (A^ — Aj)(uf C^(O)uj). Because A^ ^ Xj the orthogonality condition must hold. 
This does not hold, if eigenvectors are degenerate i.e. A^ = Xj for some z, j. However degeneracy 
can be avoided by changing the lag time r such that no eigenvalues with large magnitude coincide. 
Solutions with smaller eigenvalues might still be degenerate, but this is unproblematic since these 
solutions are discarded for clustering. In addition, "fast" modes will necessarily be uncorrelated 
with "slow" modes, because their eigenvalues are far apart. 

2. The autocovariances at a fixed lag time r are maximal: 

We show that the autocovariances are identical to the Lagrange multipliers, and thus to the 
eigenvalues in Eq. [T9j 

cUt) = XAj (22) 



To see this, multiply (19) with uj from the left, to obtain 

4(r) = uJC^(r)u, = A,uJC^(0)u, 

now use the orthogonality condition ( [2T| 

= = uJC^(r)ui = A^^^^- 

To show that the optimum found is indeed a maximum, we calculate the Hessian of the con- 
strained autocovariance cli{r). Its elements are: 

i^. = ^=cL(r)-A,cL(0) 

and show that it is a positive definite matrix 

x^Hx = x^C^(r)x - Aix^C^(0)x < Vx 

We first expand x in the basis of the generalized eigenvectors x = Ui(u^ • x) = u^q and 
use equations ( [21] ) and ( [22| ) 

^ c,c,uf C'-(r)u, - Ai ^ Qc,uf C'-(0)u, = ^ cfX, - Ai ^ cf 



Without loss of generality, we assume that the solution vectors of Eq. (19) were sorted by 
descending eigenvalues A^ to obtain an ordering from "slow" modes to "fast" modes, Ai > A2 > 
. . . > Xm- From this follows c^Xi — Ai c| < for the first solution ui. The second solution 
is restricted to a subspace that is orthogonal to Ui according to (21) and (22) 

x^C^(0)ui = x^C^(r)ui = 

Therefore we can ignore the coefficient ci in the development of x and obtain the quadratic form 

^c^^Ai - A2^c,^ 

2=2 i=2 

Again, this is negative, because A2 is the largest eigenvalue in the sum. This procedure can be 
extended to the third, fourth,... eigenvalue, showing that the optima are minima for all solutions. 



As a result, we can sort the solution vectors of Eq. (19) by descending eigenvalues A^ to obtain an 
ordering from "slow" modes to "fast" modes. 
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Symmetricity and Symmetrization of the time-lagged covariance matrix 



Consider the correlation matrix of mean- free coordinates r for lag time r: 

<i(r) = {n{t)rj{t + T)) 
and the correlation matrix for lag time r: 



cor[,(r) = 



(^) {ri{t)rj{t + T)) 



//■ 



dxdy xy p{ri {t) = x, Vj {t ^ r) = y) 



where = y{t-\-r) = y) is the unconditional transition probability between the set = {r^ = x} 

and the set ^2 = {rj = y} within time lag r. In statistically reversible dynamics, such an unconditional 
set-transition probability is symmetric (this follows directly from integrating the detailed balance 
condition /i(x)p^(x | y) = /i(y)Pr(y I x) over the sets). Thus, we can exchange time indexes and show: 

cor[^- (r) = / / dxdy xy p{ri + r ) = x, rj {t) = y) 

J X J y 

dydx yx p{rj {t) = y , {t -\- r) = x) 



II' 

J y J X 

cor^^(r). 



And then trivially 



<,-(r) = <,.(T) Vr 



When estimating correlation or covariance matrices from simulations, one cannot expect Cij = Cji to 
hold. A trivial method is to use ^ 

Cij(^) = i^{hj(r) ^ Cji(T)) 

where Cij{r) is the simulation estimate. 
Simulation setup, KID 

The coordinates of the phosphorylated KID domain (28 residues, CREB residues 119-146) were ex- 
tracted from chain B of the entry IKDX deposited in the Protein Data Bank. The entry represent 
the folded configuration of the pKID-CBP bound structure, determined through NMR [89j. Neutral 
acetylated and N-methyl caps were added to avoid artifactual charges at the peptide's termini; the 
protein was solvated with 6572 water molecules and a 85 mM KCl concentration (matching the exper- 
imental ionic strength). The system was then parametrized with the AMBER ff99SB-ILDN forcefield 
[49] : water and ions were modeled respectively with the TIP3P and Joung-Cheatham parameter sets 
[38', 'ST]. The system thus prepared was first equilibrated for 24 ns in the constant-pressure ensemble, 
during which it stabilized at a volume of approximately 60 A^. The peptide was then denatured by 
heating it at 500 -K for 17.6 ns in constant- volume conditions; 176 frames were extracted from this 
trajectory and used as starting configurations for the production runs. All of the simulations were per- 
formed with a time step of 4 fs, enabled by the hydrogen mass repartitioning scheme [25j; long-range 
electrostatic forces were computed with the particle- mesh Ewald summation method. A nonbonded 
cutoff distance of 9 A was used with a switching distance of 7.5 A for Van der Waals interactions, 
while the lengths of bonds involving hydrogen atoms were constrained with the SHAKE algorithm. 

A set of 7706 production runs was executed on the GPUGRID distributed computing network ^12]. 
Each simulation was performed in the constant- volume ensemble at 315 -K for 24 ns with the same 
parametrization used for equilibration, storing structural snapshots every 100 ps. Each production 
simulation begun either from one of the configurations visited during the denaturation run, or frames 
visited during preceding production trajectories. Starting frames were selected iteratively with an 
adaptive strategy in order to minimize the statistical uncertainty on the largest eigenvalue, computed 
on the already available simulation data, based on Singhal and Pande's algorithm [31j. 
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